Inducing Controlled Error over Variable Length Ranked Lists
Abstract
When examining the robustness of systems that take ranked lists as input, we can induce noise, measured in terms of Kendall's tau rank correlation, by applying a set number of random adjacent transpositions. Fixing the number of random transpositions ensures that any ranked list induced with this noise has a specific expected Kendall's tau. However, if we have ranked lists of varying length, it is not clear how many random transpositions we must apply to each list to obtain a consistent expected Kendall's tau across the collection. In this article we investigate how to compute the number of random adjacent transpositions required to obtain a given expected Kendall's tau for a given list length, and find that it is infeasible to compute for lists of length greater than 9. We also investigate an alternative and more efficient method of inducing noise in ranked lists, called Gaussian Perturbation. We show that, using this method, we can compute the parameters required to induce a consistent level of noise for lists of length 10 in just over six minutes. We also provide an approximate solution that gives results in less than 10^-5 seconds.
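The adjacent-transposition noise model described in the abstract can be illustrated with a short Monte Carlo sketch. This is not the article's method for computing the required number of transpositions; it only estimates the expected Kendall's tau empirically for a given list length and transposition count. The function names, the uniform choice of swap position, and the default trial count are assumptions made for illustration.

```python
import random
from itertools import combinations


def kendall_tau(perm):
    """Kendall's tau between `perm` and the identity ranking (no ties)."""
    n = len(perm)
    total_pairs = n * (n - 1) // 2
    # A pair is discordant when its order in `perm` is reversed.
    discordant = sum(1 for i, j in combinations(range(n), 2) if perm[i] > perm[j])
    return 1.0 - 2.0 * discordant / total_pairs


def random_adjacent_transpositions(n, k, rng):
    """Apply k adjacent swaps at uniformly random positions (assumed noise model)."""
    perm = list(range(n))
    for _ in range(k):
        i = rng.randrange(n - 1)
        perm[i], perm[i + 1] = perm[i + 1], perm[i]
    return perm


def estimate_expected_tau(n, k, trials=2000, seed=0):
    """Monte Carlo estimate of the expected tau after k random adjacent swaps."""
    rng = random.Random(seed)
    total = sum(
        kendall_tau(random_adjacent_transpositions(n, k, rng)) for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    # The same number of swaps disturbs a short list far more than a long one.
    for n in (5, 10, 20):
        print(n, round(estimate_expected_tau(n, k=10), 3))
```

Running the script with a fixed number of swaps across several list lengths shows the problem the abstract raises: the estimated tau rises with list length, so the number of transpositions has to be chosen per length to keep the noise level consistent.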
Similar resources
Efficient motif search in ranked lists and applications to variable gap motifs
Sequence elements, at all levels (DNA, RNA and protein), play a central role in mediating molecular recognition and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approaches to searching sequence motifs. State-of-the-art motif searching tools are limited in their c...
CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists
We participated in two tasks: Multi-8 two-years-on retrieval and Multi-8 results merging. For our multi-8 two-years-on retrieval work, simple multilingual ranked lists are first built by merging ranked lists of different languages that are generated by single types of retrieval algorithms. Then, algorithms are proposed to combine these simple multilingual ranked lists into a single ranked list....
Stability of Ranked Gene Lists in Large Microarray Analysis Studies
This paper presents an empirical study that aims to explain the relationship between the number of samples and the stability of different gene selection techniques for microarray datasets. Unlike other similar studies, where the number of genes in a ranked gene list is variable, this study uses an alternative approach in which stability is observed at different numbers of samples that are used for gene sele...
Evaluation of Personalized Concept-Based Search and Ranked Lists over Linked Open Data
Linked Open Data (LOD) provides rich structured data. As the size of LOD grows, accessing the right information becomes more challenging. In particular, the ranked-list presentation commonly used by current LOD search engines is not effective for search tasks in unfamiliar domains. Recently, the combination of clustering and personalized search has gained more attention for this purpose. In this paper,...
VLSI implementation of a reversible variable length encoder/decoder
Variable Length Codes (VLCs) are known for their efficient compression, but are susceptible to noisy environments because bit error propagation can cause synchronization losses. Recent interest in Reversible Variable Length Codes (RVLCs) has come about due to the growing need for wireless exchange of compressed image and video signals over noisy channels and the ability for RVLCs to p...